Abstract — Distributed estimation based on measurements from
multiple wireless sensors is investigated. It is assumed that a
group of sensors observes the same quantity corrupted by independent
additive observation noises with possibly different variances. The
observations are transmitted using amplify-and-forward (analog) 
transmissions over non-ideal fading wireless channels from the 
sensors to a fusion center, where they are combined to generate 
an estimate of the observed quantity. Assuming that the Best 
Linear Unbiased Estimator (BLUE) is used by the fusion center, 
the equal-power transmission strategy is first discussed, where 
the system performance is analyzed by introducing the concept 
of estimation outage and estimation diversity, and it is shown 
that there is an achievable diversity gain on the order of the 
number of sensors. The optimal power allocation strategies are 
then considered for two cases: minimum distortion under power 
constraints; and minimum power under distortion constraints. 
In the first case, it is shown that by turning off bad sensors, 
i.e., sensors with bad channels and bad observation quality, 
adaptive power gain can be achieved without sacrificing diversity 
gain. Here, the adaptive power gain is similar to the array gain 
achieved in Multiple-Input Single-Output (MISO) multi-antenna 
systems when channel conditions are known to the transmitter. 
In the second case, the sum power is minimized under zero- 
outage estimation distortion constraint, and some related energy 
efficiency issues in sensor networks are discussed. 

Index Terms — Estimation outage, estimation diversity, dis- 
tributed estimation, energy efficiency. 

I. Introduction 

Wireless Sensor Networks (WSNs) deploy geographically- 
distributed sensor nodes to collect information of interest. The 
collected information is usually aggregated via wireless trans- 
missions at a fusion center to generate the final intelligence. A 
typical wireless sensor network, as shown in Fig. 1, consists
of a fusion center and a number of sensors. The sensors 
typically have limited energy resources and communication 
capability. Each sensor in the network makes an observation 
of the quantity of interest, generates a local signal (either
analog or digital), and then sends it to the fusion center where 
the received sensor signals are combined to produce a final 
estimate of the observed quantity. Sensor networks of this 
type are suited for various applications such as environmental 
monitoring and smart factory instrumentation. 



Fig. 1. Sensor network with a fusion center.



There is a long history of research on distributed
estimation. Examples of early work include studies in the
context of distributed control [1], [2], tracking [3], and data
fusion [4], [5]. Recently, many new results have appeared in the WSN
community with a focus on distributed data fusion, where 
the most commonly used network fusion model is the one 
where each sensor processes its individual measurement and 
transmits the result over a Multiple Access Channel (MAC) to 
the fusion center. From an information-theoretic perspective, 
[6], [7], [8], [9], [10] investigate the mean squared estimation 
error performance versus transmit power for the quadratic
CEO problem with a coherent MAC. Notably, it is shown in 
[6] that if the sensor statistics are Gaussian, a simple uncoded 
(analog-and-forward) scheme dramatically outperforms the
separate source-channel coding approach and leads to an opti- 
mal asymptotic scaling behavior. The uncoded communication 
scheme is further proved to preserve the optimal scaling law 
in [9] for sensor networks with node statistics satisfying a 
certain mean condition, while the source-channel matching 
result is extended to more general homogeneous signal fields 
in [10]. If the sensor measurements are not continuous but 
in a finite alphabet, type-based transmission schemes are 
proposed in [11], [12]. The many-to-one transport capacity 
and compressibility are investigated for dense joint estimation 
sensor networks in [13]. When full coordination among
sensors is unavailable and the underlying communication 
links are not reliable, the distributed estimation problem is 
investigated in [14], where an information-theoretic achievable 
rate-distortion region is elegantly derived. The work in [15] 
studies the in-network processing approaches based on a 
hierarchical data-handling and communication architecture for
the estimation of field sources. In addition, by assuming only 
local sensor information exchange, [16] proposes a distributed 
algorithm for reaching network-wide consensus. 

Most existing studies assume
that the joint distribution of the sensor observations is known.
However, in some practical systems the probability density 
function (pdf) of the observation noise is hard to characterize, 
especially for a large scale sensor network. This motivates us 
to devise universal signal processing algorithms that do not 
require the knowledge of the observation noise distributions. 
Recently, universal Decentralized Estimation Schemes (DESs)
were proposed in [17] and [18]. In [17], the author considers
the universal distributed estimation in a homogeneous sensor 
network where sensors have observations of the same quality, 
while in [18], the universal DES in an inhomogeneous sensing 
environment is considered. These proposed DESs require each 
sensor to send to the fusion center a short discrete message 
with length determined by the local Signal to Noise Ratio 
(SNR), which then guarantees that the performance is within 
a constant factor of that achieved by the Best Linear Unbiased 
Estimator (BLUE). An assumption in these proposed schemes 
is that the channels between the sensors and the fusion center 
are perfect, i.e., all messages are received by the fusion center 
without any distortion. However, due to power limitations, 
fading, and channel noise, the signal sent by each individual 
sensor to the fusion center may be corrupted. Therefore, the 
transmission system for the joint estimation scheme should 
be designed to minimize the end-to-end distortion subject to 
certain transmit power constraints, under a practical wireless 
channel model considering both fading and additive noises. 
In this paper we show that in a fading wireless environment, 
multiple sensor nodes are not only necessary for generating 
multiple observations to reduce distortion, but also crucial to 
achieve a certain degree of diversity that minimizes the effects 
of fading during signal transmissions. 

If the sensor observation is in analog form, we have two 
main options to transmit the observations from sensors to 
the fusion center: analog or digital. For the analog approach, 
the observed signal is transmitted via analog modulation to 
the fusion center, which we refer to as the amplify-and- 
forward approach. In the digital approach, the observed signal 
is digitized into bits, possibly compressed and/or encoded, 
then digitally modulated and transmitted. It is well known 
([19], [20]) that for a single Gaussian source with an Additive 
White Gaussian Noise (AWGN) channel, both the digital and 
analog approaches can retain the optimal power-distortion 
tradeoff. Also for the estimation of a Gaussian source with 
a coherent Gaussian MAC, it is shown in [6], [9], [10] that 
the analog forwarding schemes outperform (or are as good 
as) the digital approaches and have the optimal asymptotic 
scaling behavior. For sources with general distributions, type- 
based (each sensor transmits the local type or histogram of its 
data in an analog fashion over a MAC) parametric estimation 
schemes are proposed in [9], [11], [12]. In the above papers, 
the impact of coherent MAC schemes on the joint source- 
channel optimality is discussed. 

In this paper, instead of assuming a coherent MAC, we
adopt orthogonal channels between the sensors and the fusion
center. The main motivation for using orthogonal multiple 
access schemes such as Frequency Division Multiple Access 
(FDMA) is the removal of the requirement on the carrier- 
level synchronization among sensors (we still require pair- 
wise synchronization between each sensor and the fusion 
center). We assume that the observed signal is analog and 
the observation noises are uncorrelated across sensors. In 
addition, we assume that the second moments of the signal and 
noise are known to the corresponding sensor and the fusion 
center. The fusion center deploys the best linear unbiased 
estimator to generate estimates of the unknown signal. In 
this setting, we investigate an analog transmission system 
where observations are amplified and forwarded to the fusion 
center. We first analyze the system performance in fading 
channels by introducing the concept of estimation diversity. 
We investigate the diversity gain that is achievable in a slow 
fading environment, where it is assumed that the transmission 
between sensors and the fusion center experiences i.i.d. fading 
factors together with AWGNs. An outage is declared if the end-to-end
distortion exceeds a certain threshold. In this case,
we show that using multiple sensors can achieve diversity to 
enhance the outage performance, where the diversity order 
is equal to the total number of sensors. We then find the 
optimal power allocation strategy for the case where the end- 
to-end distortion is minimized under certain transmit power 
constraints. The result leads to turning off certain sensors with 
bad channels and bad observation quality. By doing so, the 
achievable diversity order is not reduced and extra adaptive 
power gain is obtained. We finally investigate the converse 
problem to minimize the total power consumption under a 
certain distortion constraint. 

The rest of the paper is organized as follows. Section II 
discusses the system model. Section III analyzes the distortion 
performance of an equal-power transmission strategy, where 
the concept of estimation diversity is introduced. Section IV 
addresses the case where the transmission power is allocated in 
an optimal way to minimize the distortion. Section V focuses 
on the converse case where power is minimized subject to a 
distortion constraint. Section VI summarizes the results and 
presents our conclusions. 

II. System Model 

We assume a sensor network with K sensors where the
observation $x_k(t)$ at sensor k is represented as a random signal
$\theta(t)$ corrupted by observation noise $n_k(t)$: $x_k(t) = \theta(t) + n_k(t)$,
$t = 0, 1, 2, \ldots$. We also assume that both $\theta(t)$ and $n_k(t)$
are i.i.d. over time t. Each sensor transmits the signal $x_k(t)$
to the fusion center, where $\theta(t)$ is estimated from the received
versions of the $x_k(t)$'s, $k = 1, \ldots, K$. We further assume that $\theta(t)$
and $n_k(t)$ have zero mean and second moments $\sigma_\theta^2$ and $\sigma_k^2$,
respectively, based on which we define the local observation
SNR for sensor k as $\gamma_k = \sigma_\theta^2/\sigma_k^2$.

We assume that K sensors transmit their observations to 
the fusion center via K orthogonal channels (FDMA), where 
different channels experience independent fading factors and 
zero-mean AWGNs. Specifically, for channel k, we assume 
i.i.d. (over t) block fading with the channel power gain denoted
as $g_k(t)$, and i.i.d. (over t) AWGNs denoted as $n_{ck}(t)$ of
variance $\xi_k^2$, $k = 1, \ldots, K$, where the variances are assumed to
be the same for all k's in this paper. We also assume pair-wise
synchronization between each sensor and the fusion center.
However, synchronization among sensor nodes is not required.
At each sensor transmitter, we adopt an analog amplify-and-forward
uncoded strategy, motivated by the results derived
in [20]. Therefore, at sensor k, the transmitter can be simply
modeled by a power amplifying factor $\alpha_k(t)$, and the average
transmit power is given as

$$P_k = \alpha_k P_{x_k} = \alpha_k(\sigma_\theta^2 + \sigma_k^2) = \alpha'_k(1 + \gamma_k^{-1}), \qquad (1)$$

where $P_{x_k}$ is the average power of $x_k(t)$ and $\alpha'_k = \alpha_k\sigma_\theta^2$.
Note that we only need to consider the power gains (no phase
information is needed) in both the transmitter and the channel,
based on the assumptions that only the amplitude of $\theta(t)$
is estimated and that coherent reception (the effect of phase is
eliminated due to synchronization) is performed at the fusion
center for each $x_k(t)$.

Given the assumption of system independence over time t, 
we can analyze the system performance by first focusing on 
an arbitrary time snapshot, and then apply the result (which 
is conditional on one system realization) to analyze the long- 
term average and outage performance in the later sections. 
Therefore, from now on we omit the time index t in all
the parameters. The overall system structure at one snapshot
is shown in Fig. 2.




Fig. 2. Amplify and forward (received signals $y_1, \ldots, y_K$ at the fusion center)

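The received signal vector is given by

$$\mathbf{y} = \mathbf{h}\theta + \mathbf{v}, \qquad (2)$$

where

$$\mathbf{y} = [y_1, y_2, \ldots, y_K]^T, \qquad \mathbf{h} = [\sqrt{\alpha_1 g_1}, \ldots, \sqrt{\alpha_K g_K}]^T,$$
$$\mathbf{v} = [\sqrt{\alpha_1 g_1}\, n_1 + n_{c1}, \ldots, \sqrt{\alpha_K g_K}\, n_K + n_{cK}]^T,$$

with $(\cdot)^T$ denoting transposition.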

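Since we intend to make the estimator universal (independent
of particular observation noise distributions except for the
second-order statistics) and simple, the BLUE [21] is adopted
at the fusion center. Accordingly, the estimate of $\theta$ conditional
on a given set of channel gains is given by

$$\hat{\theta} = \left(\mathbf{h}^T\mathbf{R}^{-1}\mathbf{h}\right)^{-1}\mathbf{h}^T\mathbf{R}^{-1}\mathbf{y} = \left(\sum_{k=1}^{K}\frac{\alpha_k g_k}{\sigma_k^2\alpha_k g_k + \xi_k^2}\right)^{-1}\sum_{k=1}^{K}\frac{\sqrt{\alpha_k g_k}\,y_k}{\sigma_k^2\alpha_k g_k + \xi_k^2}, \qquad (3)$$

where the noise covariance matrix $\mathbf{R}$ is a diagonal matrix with
$R_{kk} = \sigma_k^2\alpha_k g_k + \xi_k^2$, $k = 1, \ldots, K$.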

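The Mean Squared Error (MSE) of this estimator is given
as [21]

$$\mathrm{Var}[\hat{\theta}] = \left(\mathbf{h}^T\mathbf{R}^{-1}\mathbf{h}\right)^{-1} = \sigma_\theta^2\left(\sum_{k=1}^{K}\frac{\alpha'_k s_k}{\gamma_k^{-1}\alpha'_k s_k + 1}\right)^{-1}, \qquad (4)$$

where, for notational convenience, we introduce the channel
SNR $s_k = g_k/\xi_k^2$, $k = 1, \ldots, K$.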
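As a numerical sanity check of Eqs. (3) and (4), the following Python sketch computes the BLUE weights directly from $h_k = \sqrt{\alpha_k g_k}$ and $R_{kk} = \sigma_k^2\alpha_k g_k + \xi_k^2$, and confirms that the resulting MSE matches the normalized form written in terms of $\alpha'_k$, $s_k$ and $\gamma_k$. The function names and the example numbers are illustrative assumptions, not part of the system model.

```python
import math


def blue_estimate(y, alpha, g, sigma2, xi2):
    """BLUE of theta from y_k = sqrt(alpha_k g_k) (theta + n_k) + n_ck, as in Eq. (3).

    Returns (estimate, mse), where mse = (h^T R^{-1} h)^{-1} as in Eq. (4).
    """
    num = den = 0.0
    for yk, ak, gk, s2k, x2k in zip(y, alpha, g, sigma2, xi2):
        hk = math.sqrt(ak * gk)        # effective gain of link k
        rkk = s2k * ak * gk + x2k      # diagonal entry R_kk of the noise covariance
        num += hk * yk / rkk
        den += hk * hk / rkk
    return num / den, 1.0 / den


def mse_normalized(alpha_p, s, gamma, sigma_theta2):
    """Right-hand side of Eq. (4): sigma_theta^2 (sum a'_k s_k / (a'_k s_k / gamma_k + 1))^{-1}."""
    total = sum(a * sk / (a * sk / gk + 1.0) for a, sk, gk in zip(alpha_p, s, gamma))
    return sigma_theta2 / total


# Illustrative values for K = 3 sensors.
alpha = [2.0, 0.5, 1.5]
g = [0.3, 1.2, 0.8]
sigma2 = [0.01, 0.02, 0.05]
xi2 = [0.1, 0.1, 0.1]
sigma_theta2 = 1.0

theta = 0.7
y_clean = [math.sqrt(a * gk) * theta for a, gk in zip(alpha, g)]  # noise-free y = h * theta
est, mse = blue_estimate(y_clean, alpha, g, sigma2, xi2)

alpha_p = [a * sigma_theta2 for a in alpha]     # alpha'_k = alpha_k sigma_theta^2
s = [gk / x for gk, x in zip(g, xi2)]           # s_k = g_k / xi_k^2
gamma = [sigma_theta2 / v for v in sigma2]      # gamma_k = sigma_theta^2 / sigma_k^2
print(est, mse, mse_normalized(alpha_p, s, gamma, sigma_theta2))
```

With noise-free inputs the estimator returns $\theta$ exactly, which reflects its unbiasedness, and the two MSE expressions agree up to floating-point rounding.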

We now summarize the notation that we have defined so
far (for a particular time snapshot).

• $\theta$ and $\sigma_\theta^2$: Signal to estimate and its variance;

• $n_k$ and $\sigma_k^2$: Observation noise and the associated variance
at sensor k;

• $x_k$: Observation signal at sensor k;

• $\alpha_k$: Power amplifying factor at sensor k; in addition,
$\alpha'_k := \alpha_k\sigma_\theta^2$;

• $g_k$: Power gain of channel k;

• $n_{ck}$ and $\xi_k^2$: Zero-mean AWGN of channel k and its
variance, with $\xi_k^2$ the same for all k's;

• $\gamma_k := \sigma_\theta^2/\sigma_k^2$: Local observation SNR at sensor k;

• $s_k := g_k/\xi_k^2$: SNR of channel k.

III. Equal-Power Allocation: Estimation Diversity



Given the proposed joint estimation system, we are inter- 
ested in investigating how the overall distortion performance 
is affected by the fact that we have multiple sensors with 
independent fading channels. We first investigate how the 
average distortion scales with the number of sensors in the 
network, and secondly, we quantify how the reliability of 
the overall estimation system is enhanced as we increase 
the number of sensors given independent observations and 
independent fading channels across different sensors. 

To ensure fairness when we compare systems with
different numbers of sensor nodes, we fix the total transmission
power that the K nodes can use, denoted as $P_{tot}$. In this
section, we consider the case of equal-power allocation where
all sensors transmit the same amount of power. As the total
power budget for all sensors is $P_{tot}$, according to Eq. (1), we
have

$$\alpha'_k = \frac{1}{1+\gamma_k^{-1}}\cdot\frac{P_{tot}}{K}, \qquad 1 \le k \le K.$$

According to Eq. (4), the achieved MSE is

$$\mathrm{Var}[\hat{\theta}] = \sigma_\theta^2\left(\sum_{k=1}^{K}\frac{P_{tot}s_k}{\gamma_k^{-1}P_{tot}s_k + K(1+\gamma_k^{-1})}\right)^{-1}. \qquad (5)$$

We assume that the channels between sensors and the
fusion center experience channel gains $g_k$'s, which are i.i.d.
over k, and the sensors have different observation noises
with random variances $\sigma_k^2$'s that are also i.i.d. over k. The
i.i.d. assumption on the $\sigma_k^2$'s can be justified if we assume that
the sensors are randomly deployed into the field and the
different measurement noise variances are caused by different
observation distances. With these assumptions, both the $\gamma_k$'s
and the $s_k$'s are i.i.d. over k, so that $E(\gamma_k) = E(\gamma_1)$ and $E(s_k) = E(s_1)$,
$k = 1, \ldots, K$.

Our first question is: Suppose that the total power $P_{tot}$ is
fixed; what is the asymptotic behavior of $\mathrm{Var}[\hat{\theta}]$ when the total
number of sensors K increases without bound?

When the $s_k$'s and $\gamma_k$'s are random and i.i.d. over k, we have

$$\lim_{K\to\infty}\mathrm{Var}[\hat{\theta}] = \frac{\sigma_\theta^2}{P_{tot}\,E[s_1/(1+\gamma_1^{-1})]} := D_\infty, \qquad (6)$$

for which the derivation is given in Appendix A.
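The closed-form expressions in Eqs. (5) and (6) are easy to evaluate. The Python sketch below assumes, purely for illustration, identical channels ($s_k = 1$) and identical observation SNRs ($\gamma_k = 100$) with $P_{tot} = 10$; it shows the equal-power MSE decreasing in K while staying above $D_\infty$. The function names and parameter values are hypothetical.

```python
def equal_power_mse(s, gamma, p_tot, sigma_theta2=1.0):
    """Eq. (5): BLUE MSE when each of the K sensors transmits P_tot / K."""
    K = len(s)
    total = sum(p_tot * sk / (p_tot * sk / gk + K * (1.0 + 1.0 / gk))
                for sk, gk in zip(s, gamma))
    return sigma_theta2 / total


def d_infinity(mean_eta, p_tot, sigma_theta2=1.0):
    """Eq. (6): limiting MSE sigma_theta^2 / (P_tot E[s / (1 + 1/gamma)])."""
    return sigma_theta2 / (p_tot * mean_eta)


gamma0, p_tot = 100.0, 10.0
mse_by_k = {K: equal_power_mse([1.0] * K, [gamma0] * K, p_tot) for K in (1, 10, 100, 1000)}
d_inf = d_infinity(1.0 / (1.0 + 1.0 / gamma0), p_tot)   # here E[eta] = 1 / (1 + 1/gamma0)
for K, v in mse_by_k.items():
    print(K, v, d_inf)
```

Each additional sensor helps less than the previous one, and the gap to $D_\infty$ closes without the MSE itself reaching zero.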

From the result in Eq. (6) and the corresponding derivation
given in Appendix A, we conclude the following:

• With a finite amount of total transmit power $P_{tot}$ for
all the sensors, the overall MSE $\mathrm{Var}[\hat{\theta}]$ does not decrease
to zero even if K approaches infinity. This is a
consequence of using orthogonal links from the sensors
to the fusion center, which leads to K different channel
noises such that the corruption of channel noise cannot
be eliminated even when K goes to infinity. Systems
based on non-orthogonal multiple access schemes are
discussed in [25], [26], where it is shown that with finite
total transmit power, $\mathrm{Var}[\hat{\theta}]$ can be driven to zero when
K goes to infinity. However, those systems require perfect
carrier synchronization among all the sensor nodes and
full channel knowledge (amplitude gain and phase shift)
at the transmitters, which may not be feasible
in practical systems.

• Although $\mathrm{Var}[\hat{\theta}]$ is bounded away from zero, it decreases
monotonically with K. However, the reduction in distortion
with each additional sensor decreases as K gets large
(cf. Eq. (23));

• When K is large, $\mathrm{Var}[\hat{\theta}]$ is inversely proportional to $P_{tot}$.
Thus, if $P_{tot}$ also grows without bound (at any rate) as K
increases, we have $\lim_{K\to\infty}\mathrm{Var}[\hat{\theta}] = 0$.

This analysis suggests that when the total amount of power
$P_{tot}$ is fixed, even though the total number of sensors K
increases without bound, the achieved average distortion at
the fusion center does not decrease below a certain level.
However, are there any benefits of having more sensors in the
network if we limit the amount of total power? To answer this
question, let us define the outage probability $\mathcal{P}_{D_0}$ to model
the system reliability as follows,



$$\mathcal{P}_{D_0} = \mathrm{Prob}\{\mathrm{Var}[\hat{\theta}] > D_0\}, \qquad (7)$$



where $D_0$ is a predefined threshold. Given the i.i.d. nature
of the $s_k$'s and $\gamma_k$'s, the probability of $\mathrm{Var}[\hat{\theta}] > D_0$ at one
particular snapshot is an appropriate indicator of the long-term
estimation system reliability. The following theorem
summarizes the relationship among $\mathcal{P}_{D_0}$, $P_{tot}$, and the number
of sensors K.
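The outage probability in Eq. (7) can be estimated by Monte Carlo simulation. The following Python sketch does so for equal-power allocation, Eq. (5), under illustrative assumptions that are not the paper's simulation setup: $\gamma_k = 100$ for every sensor, $s_k$ i.i.d. exponential with unit mean (i.e., Rayleigh-distributed $\sqrt{s_k}$), $P_{tot} = 10$, and $D_0 = 0.3$, which lies above $D_\infty \approx 0.101$ for these values. The function name and defaults are hypothetical.

```python
import random


def outage_probability(K, p_tot, d0, gamma=100.0, mean_s=1.0,
                       trials=20000, seed=0, sigma_theta2=1.0):
    """Monte Carlo estimate of Prob{Var[theta_hat] > D_0} under equal power, Eq. (5).

    Assumed model (illustrative): gamma_k = gamma for all k, s_k ~ Exp(mean mean_s).
    """
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        total = 0.0
        for _ in range(K):
            s = rng.expovariate(1.0 / mean_s)
            total += p_tot * s / (p_tot * s / gamma + K * (1.0 + 1.0 / gamma))
        if sigma_theta2 / total > d0:   # Var[theta_hat] = sigma_theta^2 / total
            outages += 1
    return outages / trials


for K in (1, 4, 8):
    print(K, outage_probability(K, p_tot=10.0, d0=0.3))
```

The estimate drops sharply as K grows even though the total power is held fixed, which is the effect the following theorem quantifies.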

Theorem 3.1: Suppose the sensor observation SNRs $\{\gamma_k : k = 1, 2, \ldots, K\}$
and channel SNRs $\{s_k : k = 1, 2, \ldots, K\}$ are
both i.i.d. across k. Define $\eta_k := s_k/(1+\gamma_k^{-1})$. In addition,
we assume that $E[\eta_k]$ and $E[\gamma_k^{-1}s_k^2]$ are finite. When the total
number of sensors K is large, with the total power $P_{tot}$ and
equal-power allocation among sensors, we have the achievable
average distortion

$$\lim_{K\to\infty}\mathrm{Var}[\hat{\theta}] = \frac{\sigma_\theta^2}{P_{tot}E[\eta_1]} = D_\infty.$$

Moreover, for a sufficiently large but finite K and $D_0 > D_\infty$¹,
we have the outage probability (cf. Eq. (7))

$$\mathcal{P}_{D_0} \sim \exp(-K I_\eta(a)), \quad \text{or} \quad -\log\mathcal{P}_{D_0} \sim K I_\eta(a),$$

where ~ means asymptotically converging to (as K becomes
large), $\eta$ is the common distribution of the $\eta_k$'s, $a = \sigma_\theta^2/(D_0 P_{tot})$,
and $I_\eta(a)$ is the rate function of $\eta$:

$$I_\eta(a) = \sup_{\tau\in\mathbb{R}}\left(\tau a - \log M_\eta(\tau)\right),$$

with $M_\eta(\tau)$ the moment generating function of $\eta$.

A more detailed explanation of the rate function and the
proof of Theorem 3.1 are given in Appendix B.

From the theorem we see that K plays the role of estimation 
diversity order here, in that the outage probability decreases 
exponentially with K. We remark that the fact that the outage 
probability decays exponentially with the number of sensors 
is due to the effect of independent measurements and fading 
coefficients, which bears similar properties as the probability 
of detection error in distributed detection [22], [23], [12], 
[24]. Note that even though Theorem 3.1 is an asymptotic
result for large K, we later show by simulation results that 
the outage probability curve illustrates diversity order of 
K (approximately) even for small values of K in practical 
scenarios. 

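As an example, let us consider the case in which $\gamma_k = 1$
for all k and $\sqrt{s_k}$ is i.i.d. Rayleigh with pdf

$$f_{\sqrt{s}}(x) = \frac{x}{\delta^2}\exp\left\{-\frac{x^2}{2\delta^2}\right\}, \qquad x \ge 0.$$

Then $\eta_k = s_k/2$ has an exponential distribution with pdf

$$f_\eta(x) = \frac{1}{\delta^2}\exp\left\{-\frac{x}{\delta^2}\right\}, \qquad x \ge 0.$$

Therefore, $E[\eta_k] = \delta^2$. Thus the achieved asymptotic distortion
when K is large is given by

$$\lim_{K\to\infty}\mathrm{Var}[\hat{\theta}] = \frac{\sigma_\theta^2}{P_{tot}E[\eta_k]} = \frac{\sigma_\theta^2}{P_{tot}\delta^2}.$$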
¹For the other cases of $D_0 \le D_\infty$, it is easy to see that $\mathcal{P}_{D_0} \to 1$.

Now we calculate the rate $I_\eta(a)$. It is easy to see that the
moment generating function of an exponentially distributed
random variable with mean b is given as

$$M(\tau) = \frac{1}{1 - b\tau}, \qquad \tau < 1/b.$$

Thus

$$I_\eta(a) = \sup_{\tau\in\mathbb{R}}\left[\tau a + \log(1 - b\tau)\right] = \frac{a}{b} - 1 - \log\frac{a}{b} = \frac{\sigma_\theta^2}{\delta^2 D_0 P_{tot}} - 1 - \log\frac{\sigma_\theta^2}{\delta^2 D_0 P_{tot}},$$

where in the last step we substituted the expressions $a = \sigma_\theta^2/(D_0 P_{tot})$
and $b = \delta^2$. When $P_{tot}$ is large, i.e., when
$\sigma_\theta^2/(\delta^2 D_0 P_{tot}) \ll 1$, this leads to

$$I_\eta(a) \approx \log\frac{\delta^2 D_0 P_{tot}}{\sigma_\theta^2} \approx \log P_{tot},$$

which means that the estimation convergence rate is approximately
$\log P_{tot}$ when $\sigma_\theta^2/(\delta^2 D_0 P_{tot}) \ll 1$ is satisfied. In other
words, for Rayleigh fading channels we have

$$-\log\mathcal{P}_{D_0} \sim K\log P_{tot}, \qquad (8)$$

which shows that the diversity order K is the magnitude of the slope of the
outage probability vs. power curve when both are plotted on a
log-log scale.
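The Rayleigh-fading rate function can be cross-checked numerically. The Python sketch below approximates $I_\eta(a) = \sup_\tau[\tau a - \log M_\eta(\tau)]$ by a plain grid search and compares it with the closed form $a/b - 1 - \log(a/b)$ for an exponential $\eta$ with mean b. The grid limits, step count, and the values $b = 1$, $a = 0.2$ are arbitrary illustrative choices; for $a < b$ the supremum lies at $\tau^* = 1/b - 1/a$, which is inside the grid.

```python
import math


def rate_function_numeric(a, log_mgf, lo, hi, steps=200000):
    """Grid-search approximation of I(a) = sup_tau [tau * a - log M(tau)] over [lo, hi]."""
    best = -math.inf
    for i in range(steps + 1):
        tau = lo + (hi - lo) * i / steps
        best = max(best, tau * a - log_mgf(tau))
    return best


def rate_function_exponential(a, b):
    """Closed form for eta ~ Exp(mean b): I(a) = a/b - 1 - log(a/b)."""
    r = a / b
    return r - 1.0 - math.log(r)


b, a = 1.0, 0.2


def log_mgf(tau):
    """log M(tau) = -log(1 - b * tau) for an exponential with mean b (valid for tau < 1/b)."""
    return -math.log(1.0 - b * tau)


numeric = rate_function_numeric(a, log_mgf, lo=-20.0, hi=0.9 / b)
print(numeric, rate_function_exponential(a, b))
```

Shrinking a (i.e., increasing $P_{tot}$) makes the $-\log(a/b)$ term dominant, which is the $\log P_{tot}$ behavior behind Eq. (8).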

We now provide some numerical examples to verify the
analytical results. We assume that the channel SNR $s_k$ follows
a distance-based path-loss model with Rayleigh fading, where
$d_k$ is the transmission distance from sensor k to the fusion center,
$G_0 = -30$ dB is the nominal gain at $d_k = 1$ m, and the $|r_k|$'s are
i.i.d. Rayleigh fading random variables with unit variance; the
resulting channel power gain is normalized by the channel noise
variance $\xi_k^2$. We take $\xi_k^2 = -90$ dBm, $k = 1, \ldots, K$.
To emphasize the possible diversity gain enabled
by the independent channel fading values, we set $d_k = 100$ m
and $\sigma_k^2 = 0.01$ for all k. In addition, we set $\sigma_\theta^2 = 1$. The
outage threshold $D_0$ is set as $D_0 = 2\sigma_k^2 = 0.02$.
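The parameters above mix logarithmic and linear units. The following Python sketch converts them to linear scale; the helper names are hypothetical, and the sketch only restates the stated settings (it does not reproduce the path-loss model itself).

```python
def db_to_linear(db):
    """Power ratio in dB -> linear ratio."""
    return 10.0 ** (db / 10.0)


def dbm_to_watts(dbm):
    """Power in dBm -> watts (0 dBm = 1 mW)."""
    return 10.0 ** ((dbm - 30.0) / 10.0)


G0 = db_to_linear(-30.0)              # nominal gain at d_k = 1 m
XI2 = dbm_to_watts(-90.0)             # channel noise variance xi_k^2 in watts
SIGMA_THETA2, SIGMA_K2 = 1.0, 0.01
GAMMA_K = SIGMA_THETA2 / SIGMA_K2     # local observation SNR gamma_k
D0 = 2 * SIGMA_K2                     # outage threshold D_0

print(G0, XI2, GAMMA_K, D0)
```

A local observation SNR of 100 (20 dB) together with $D_0 = 2\sigma_k^2$ means that the outage threshold is twice the MSE a single sensor would achieve over a noiseless channel.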

The end-to-end distortion performance, averaged over random
channel gains, is plotted in Fig. 3 for different numbers of
sensors, where each point is a sample average over one million
independent random channel samples. It is not surprising that
the average distortion decreases as we increase the total power
budget. Note that the average distortion barely improves when
we increase the number of nodes from 3 to 30, which matches
the comments given after Eq. (6). However, this does not mean
that the 3-node case performs as well as the 30-node case,
since the average performance is not a good criterion to use
in a slow fading environment, where the outage performance
is more informative.



The outage probability versus the total transmission power
is plotted in Fig. 4 for different numbers of sensors, where we
see that the 3-node case performs much better than the 1-node
case and the 9-node case performs much better than the 3-node
case. Approximately, when the logarithm of outage probability
is plotted versus the logarithm of total transmission power, the
slope of the curve in the high-power region is proportional to
the number of sensors, which is defined as the diversity order
in Eq. (8). Note that this definition of diversity order is based
on the distortion outage performance, which is different from
the traditional definition of diversity order in Multiple-Input
Multiple-Output (MIMO) multi-antenna systems [31], which
is usually the slope of symbol error curves. However, the two
definitions imply similar performance benefits from diversity.
This type of diversity gain is also shown for large numbers
of sensors in Fig. 5, where we see that for the same $D_0 = 0.02$
threshold, the slope of the 20-node curve is twice that of
the 10-node curve in the high-power region. Not surprisingly,
when we decrease $D_0$ (down to 0.015 as shown in Fig. 5), the
outage probability increases. It is worth mentioning
that since $D_\infty$ decreases with $P_{tot}$, when $P_{tot}$ increases, a
fixed $D_0$ becomes progressively less demanding as it gets further
away from $D_\infty$. As such, a more appropriate definition for the
outage probability may be $\mathcal{P}_\epsilon = \mathrm{Prob}\{\mathrm{Var}[\hat{\theta}] > (1+\epsilon)D_\infty\}$
(as pointed out by one of the reviewers), which is
worth further investigation but beyond the scope of this paper.
Fig. 3. Average Distortion vs. Total Power

Fig. 4. Outage Probability vs. Total Power

IV. Optimal Power Allocation: Diversity Gain + Power Gain



In the previous section we showed that diversity gain can 
be achieved even if we use a simple uniform power allocation 
scheme. In this section, we optimize power allocation among 
the sensors to minimize the total distortion. The diversity 
performance is analyzed and we show that a certain adaptive 
power gain can be achieved by optimal power control. To 
clarify the analysis, we first discuss the problem with only 
a sum power constraint, then discuss the general case with 
both sum and individual power constraints. 
Fig. 5. Outage Probability vs. Total Power for Large Numbers of Sensors



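A. Optimal power allocation with a sum power constraint

With a sum power constraint, the minimum distortion joint
estimation problem for each given set of channel gains can be
cast as

$$\min_{\{P_k\}} \; \mathrm{Var}[\hat{\theta}] \quad \text{s.t.} \quad \sum_{k=1}^{K} P_k \le P_{tot}, \quad P_k \ge 0, \; k = 1, \ldots, K,$$

where $P_{tot}$ is the total power constraint across all the nodes.
With Eqs. (1) and (4), we can rewrite the above problem as

$$\min_{\{\alpha'_k\}} \; \sigma_\theta^2\left(\sum_{k=1}^{K}\frac{\alpha'_k s_k}{\gamma_k^{-1}\alpha'_k s_k + 1}\right)^{-1} \quad \text{s.t.} \quad \sum_{k=1}^{K}\alpha'_k(1+\gamma_k^{-1}) \le P_{tot}, \quad \alpha'_k \ge 0, \; k = 1, \ldots, K.$$

Our goal is to obtain the optimal power allocation, i.e., the
optimal $\alpha'_k$'s. To simplify the objective function, we rewrite
the problem as

$$\min_{\{\alpha'_k\}} \; -\sum_{k=1}^{K}\frac{\alpha'_k s_k}{\gamma_k^{-1}\alpha'_k s_k + 1} \quad \text{s.t.} \quad \sum_{k=1}^{K}\alpha'_k(1+\gamma_k^{-1}) \le P_{tot}, \quad \alpha'_k \ge 0, \; k = 1, \ldots, K. \qquad (9)$$

For nonnegative $\alpha'_k$, it can be shown that the second derivative
with respect to $\alpha'_k$ of each term in the objective function is
nonnegative. Since the objective function is also separable over
the $\alpha'_k$'s (no coupled terms over different k's), it is jointly convex
over all the $\alpha'_k$'s. In addition, all the constraints are linear.
Thus, the problem is convex.

Now we solve the problem in Eq. (9). Its Lagrangian G is

$$G(\alpha'_k; \lambda, \mu_k) = -\sum_{k=1}^{K}\frac{\alpha'_k s_k}{\gamma_k^{-1}\alpha'_k s_k + 1} - \lambda\left[P_{tot} - \sum_{k=1}^{K}\alpha'_k(1+\gamma_k^{-1})\right] - \sum_{k=1}^{K}\mu_k\alpha'_k,$$

which leads to the following Karush-Kuhn-Tucker (KKT)
conditions [29]:

$$-\frac{s_k^{-1}}{(\gamma_k^{-1}\alpha'_k + s_k^{-1})^2} + \lambda(1+\gamma_k^{-1}) - \mu_k = 0, \quad \forall k,$$
$$\sum_{k=1}^{K}\alpha'_k(1+\gamma_k^{-1}) - P_{tot} = 0,$$
$$\mu_k\alpha'_k = 0, \quad \forall k,$$
$$\alpha'_k \ge 0, \quad \forall k.$$

From the first equation in the above set we obtain

$$\gamma_k^{-1}\alpha'_k + s_k^{-1} = \sqrt{\frac{s_k^{-1}}{\lambda(1+\gamma_k^{-1}) - \mu_k}},$$

which leads to the solution

$$\alpha'_k = \gamma_k\left(\sqrt{\frac{s_k^{-1}}{\lambda(1+\gamma_k^{-1}) - \mu_k}} - s_k^{-1}\right).$$

Also we can see from the third equation that for those sensors
with $\alpha'_k > 0$ (i.e., $P_k > 0$), $\mu_k = 0$ holds. Therefore,

$$\alpha'_k = \gamma_k\left(\frac{1}{\sqrt{\lambda s_k(1+\gamma_k^{-1})}} - \frac{1}{s_k}\right)^+ = \frac{\gamma_k}{s_k}\left(\sqrt{\frac{\eta_k}{\lambda}} - 1\right)^+, \quad \forall k, \qquad (10)$$

where $(x)^+$ equals 0 when x is less than zero, and otherwise
equals x. The first equality follows from the fact that if $\alpha'_k > 0$,
then $\mu_k = 0$, and if $\alpha'_k = 0$, then removing $\mu_k$ results in the
difference within the bracket being non-positive.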
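For completeness, the convexity claim made for problem (9) can be verified term by term. Writing $\phi_k(x)$ (a label introduced here only for this derivation) for the k-th term of the objective,

$$\phi_k(x) = -\frac{x s_k}{\gamma_k^{-1} x s_k + 1}, \qquad \phi_k'(x) = -\frac{s_k}{(\gamma_k^{-1} x s_k + 1)^2}, \qquad \phi_k''(x) = \frac{2\gamma_k^{-1} s_k^2}{(\gamma_k^{-1} x s_k + 1)^3} \ge 0 \quad (x \ge 0).$$

Setting $\phi_k'(\alpha'_k) + \lambda(1+\gamma_k^{-1}) - \mu_k = 0$ reproduces the first KKT condition, since $s_k/(\gamma_k^{-1}\alpha'_k s_k + 1)^2 = s_k^{-1}/(\gamma_k^{-1}\alpha'_k + s_k^{-1})^2$.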

The Lagrange multiplier $\lambda$ in Eq. (10) and the number
of active sensors (those assigned non-zero power) can
be uniquely determined from the power constraint by the
following two-step derivation.

• First, let us assume that only the first $K_1$ sensors
are active, such that $\lambda_0$ can be solved by substituting
$[\alpha'_1, \ldots, \alpha'_{K_1}]$ back into the second KKT condition. This
assumption can be guaranteed by ranking the sensors
(according to $\eta_k$, which is a function of both the observation
SNR and the channel SNR) such that

$$\eta_1 \ge \eta_2 \ge \cdots \ge \eta_K, \qquad (11)$$

and by the fact that $\alpha'_k = 0$ if $\eta_k \le \lambda$. As such, we obtain

$$\sqrt{\lambda_0} = \frac{A(K_1)}{B(K_1)},$$

where for any $1 \le k \le K$,

$$A(k) = \sum_{j=1}^{k}\frac{\gamma_j}{\sqrt{\eta_j}}, \qquad B(k) = \sum_{j=1}^{k}\frac{\gamma_j}{\eta_j} + P_{tot}.$$

• Secondly, we substitute $\lambda_0$ back into Eq. (10) and solve for the
cutoff index $K_1$, which is obviously determined by the
relative magnitudes between $\sqrt{\eta_k/\lambda_0}$ and 1. Naturally, we
introduce the notation

$$f(k) = \sqrt{\eta_k}\,\frac{B(k)}{A(k)} - 1, \qquad 1 \le k \le K. \qquad (12)$$

It follows from Eqs. (10) and (12) that solving for the
threshold $K_1$ is equivalent to finding $K_1$ such that
$f(K_1) > 0$ and $f(K_1 + 1) \le 0$. Using the same
techniques as in [30, Appendix B], we can show that
such a $K_1$ is unique and always exists unless $f(k) > 0$
for all $1 \le k \le K$, in which case we take $K_1 = K$, which
means that all sensors are active.
Hence, it follows from Eq. (10) that

$$\alpha'^{\,opt}_k = \begin{cases} \dfrac{\gamma_k}{s_k}\left(c_0\sqrt{\eta_k} - 1\right), & k \le K_1, \\ 0, & k > K_1, \end{cases} \qquad (13)$$

where $c_0 = B(K_1)/A(K_1)$. It is easy to see that $c_0$
defines the threshold on the $\eta_k$'s (i.e., $\eta_k > 1/c_0^2$), by which
we can decide whether a sensor transmits or keeps silent. Note
that the figure of merit $\eta_k = s_k/(1+\gamma_k^{-1})$ is jointly defined by
the channel SNR and the local observation SNR. Sensors with
low $\eta_k$ are completely shut off, so no power is wasted on them.
For the remaining active sensors, power should be assigned
according to Eq. (13).
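The two-step procedure of Eqs. (11)-(13) maps onto a sort followed by a single scan. The Python sketch below is one possible implementation under the stated model; the function name and the return convention (allocations in the original sensor order, plus $c_0$ and $K_1$) are our own choices. It relies on the uniqueness of $K_1$ stated above and stops at the first index where $f(k) \le 0$.

```python
import math


def optimal_allocation(s, gamma, p_tot):
    """Sum-power-constrained allocation following Eqs. (10)-(13).

    Returns (alpha_p, c0, K1): alpha'_k for every sensor in the original order,
    the constant c0 = B(K1) / A(K1), and the number of active sensors K1.
    """
    K = len(s)
    eta = [sk / (1.0 + 1.0 / gk) for sk, gk in zip(s, gamma)]     # figure of merit eta_k
    order = sorted(range(K), key=lambda k: eta[k], reverse=True)  # ranking of Eq. (11)

    sum_g_sqrt = 0.0   # A(k) accumulated over accepted sensors
    sum_g_eta = 0.0    # sum_j gamma_j / eta_j, so that B(k) = sum_g_eta + P_tot
    K1, c0 = 0, 0.0
    for rank, k in enumerate(order, start=1):
        new_g_sqrt = sum_g_sqrt + gamma[k] / math.sqrt(eta[k])   # A(rank)
        new_g_eta = sum_g_eta + gamma[k] / eta[k]
        c = (new_g_eta + p_tot) / new_g_sqrt                     # B(rank)/A(rank) = 1/sqrt(lambda_0)
        if c * math.sqrt(eta[k]) - 1.0 > 0.0:                    # f(rank) > 0, Eq. (12)
            sum_g_sqrt, sum_g_eta = new_g_sqrt, new_g_eta
            K1, c0 = rank, c
        else:
            break                                                # first f(rank) <= 0: K1 = rank - 1

    alpha_p = [0.0] * K
    for k in order[:K1]:
        alpha_p[k] = gamma[k] / s[k] * (c0 * math.sqrt(eta[k]) - 1.0)   # Eq. (13)
    return alpha_p, c0, K1


s = [5.0, 0.2, 1.0, 0.05]
gamma = [100.0, 100.0, 10.0, 100.0]
alpha_p, c0, K1 = optimal_allocation(s, gamma, p_tot=1.0)
powers = [a * (1.0 + 1.0 / g) for a, g in zip(alpha_p, gamma)]   # P_k from Eq. (1)
print(K1, powers, sum(powers))
```

With this small budget only the sensor with the largest $\eta_k$ is switched on; raising `p_tot` brings more sensors in. Only $\lambda_0 = 1/c_0^2$ has to be broadcast, after which each sensor can evaluate Eq. (10) from its own $\gamma_k$ and $s_k$.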

To implement the described optimal power scheduling
scheme, the fusion center needs to calculate and broadcast the
threshold $\lambda_0$ to all the sensors. Each sensor then decides its
optimal transmit power according to its own local information
($\gamma_k$ and $s_k$) and $\lambda_0$. Such a power scheduling scheme is
feasible when there exists a (low-rate) feedback broadcast channel
from the fusion center to each sensor and the channel
changes slowly.

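Furthermore, according to Eq. (4), the total distortion is
given by

$$\mathrm{Var}[\hat{\theta}] = \sigma_\theta^2\left(\sum_{k=1}^{K_1}\gamma_k\left(1 - \frac{1}{c_0\sqrt{\eta_k}}\right)\right)^{-1}, \qquad (14)$$

and the outage probability can be rewritten as

$$\mathcal{P}_{D_0} = \mathrm{Prob}\left\{\sum_{k=1}^{K_1}\gamma_k\left(1 - \frac{1}{c_0\sqrt{\eta_k}}\right) < \frac{\sigma_\theta^2}{D_0}\right\}. \qquad (15)$$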
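Eq. (14) follows from substituting Eq. (13) into Eq. (4): for an active sensor, $\alpha'_k s_k = \gamma_k(c_0\sqrt{\eta_k} - 1)$ and $\gamma_k^{-1}\alpha'_k s_k + 1 = c_0\sqrt{\eta_k}$. The Python sketch below checks this identity, and the outage event inside Eq. (15), for an arbitrary illustrative $c_0$ (the identity holds for any $c_0$ with $c_0\sqrt{\eta_k} > 1$ on the active set). The function names and numbers are hypothetical.

```python
import math


def var_eq14(c0, eta, gamma, sigma_theta2=1.0):
    """Eq. (14) over the active sensors."""
    return sigma_theta2 / sum(gk * (1.0 - 1.0 / (c0 * math.sqrt(ek)))
                              for ek, gk in zip(eta, gamma))


def var_eq4(alpha_p, s, gamma, sigma_theta2=1.0):
    """Eq. (4) evaluated directly from alpha'_k, s_k and gamma_k."""
    return sigma_theta2 / sum(a * sk / (a * sk / gk + 1.0)
                              for a, sk, gk in zip(alpha_p, s, gamma))


def in_outage(c0, eta, gamma, d0, sigma_theta2=1.0):
    """The event inside the probability of Eq. (15)."""
    total = sum(gk * (1.0 - 1.0 / (c0 * math.sqrt(ek))) for ek, gk in zip(eta, gamma))
    return total < sigma_theta2 / d0


s = [4.0, 2.0]
gamma = [100.0, 20.0]
eta = [sk / (1.0 + 1.0 / gk) for sk, gk in zip(s, gamma)]
c0 = 2.0
alpha_p = [gk / sk * (c0 * math.sqrt(ek) - 1.0)                   # Eq. (13), active sensors
           for sk, gk, ek in zip(s, gamma, eta)]
print(var_eq14(c0, eta, gamma), var_eq4(alpha_p, s, gamma))
```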



Obtaining a closed-form expression for the outage probability
is difficult. However, we can numerically evaluate the
performance of the optimal power transmission scheme and
compare it with the closed-form solution developed for the
equal-power case in the previous section. Since equal-power
allocation is just one feasible solution of the optimization
problem in Eq. (9), the resulting optimal solution (which
may turn off bad sensors) leads to distortion no higher than
that of the equal-power allocation strategy. Given that we have
theoretically shown that the equal-power allocation strategy
achieves full estimation diversity (on the order of K), we can
state that the optimal power allocation strategy performs at
least equally well, i.e., it also achieves full estimation diversity. This
is further illustrated by the following simulation results.
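The claim that the optimal allocation never does worse than equal power is easy to spot-check numerically. The Python sketch below solves the KKT conditions by a different route than Eqs. (11)-(13): it finds $\lambda$ in Eq. (10) by geometric bisection on the sum-power constraint (a swapped-in root-finding method, chosen here for brevity) and then compares Eq. (4) for both allocations. The random channel and SNR distributions are illustrative assumptions only.

```python
import math
import random


def allocation_by_bisection(s, gamma, p_tot, iters=200):
    """alpha'_k from Eq. (10), with lambda found by geometric bisection.

    Root finding on the sum-power constraint is used here as an illustrative
    substitute for the sorted two-step procedure of Eqs. (11)-(13).
    """
    eta = [sk / (1.0 + 1.0 / gk) for sk, gk in zip(s, gamma)]

    def alloc(lam):
        return [gk / sk * max(math.sqrt(ek / lam) - 1.0, 0.0)
                for sk, gk, ek in zip(s, gamma, eta)]

    def power(lam):
        return sum(a * (1.0 + 1.0 / gk) for a, gk in zip(alloc(lam), gamma))

    lo = hi = max(eta)            # power(max eta) == 0: nobody transmits
    while power(lo) < p_tot:      # power() decreases in lambda, so shrink lo
        lo /= 100.0
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if power(mid) > p_tot:
            lo = mid
        else:
            hi = mid
    return alloc(hi)              # hi side satisfies the budget


def mse(alpha_p, s, gamma, sigma_theta2=1.0):
    """Eq. (4) in terms of alpha'_k, s_k and gamma_k."""
    return sigma_theta2 / sum(a * sk / (a * sk / gk + 1.0)
                              for a, sk, gk in zip(alpha_p, s, gamma))


def equal_power(s, gamma, p_tot):
    """alpha'_k = P_tot / (K (1 + 1/gamma_k)), the equal-power allocation of Section III."""
    K = len(s)
    return [p_tot / (K * (1.0 + 1.0 / gk)) for gk in gamma]


rng = random.Random(7)
s = [rng.expovariate(1.0) + 1e-6 for _ in range(8)]
gamma = [rng.uniform(1.0, 100.0) for _ in range(8)]
a_opt = allocation_by_bisection(s, gamma, 2.0)
print(mse(a_opt, s, gamma), mse(equal_power(s, gamma, 2.0), s, gamma))
```

Because problem (9) is convex, any method that satisfies the KKT conditions reaches the same optimum; bisection simply avoids the explicit sorting step.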

We assume that the related system parameters are set the
same as in the equal-power case in Section III. In Fig. 6,
we plot the percentage of active sensors versus the total
transmission power, where we set K = 100 in the simulation.
We note that the number of active sensors can be less than K
when the total power budget is small. In Fig. 7, we compare
the outage performance of the optimal power scheme with the
case where all the sensors transmit with equal power. From
the outage probability curves, we see that for the same number
of sensor nodes the curve slopes are almost the same for both
the equal-power and the optimal power cases, which means
that the optimal power transmission strategy achieves the same
diversity order of K. In addition, the curve for the optimal
power case is a left-shifted version of that for the equal-power
case. This shift is a result of the adaptive power gain provided
by the optimal power control. This gain is similar to the array or
coding gain in traditional MIMO systems [31].



Fig. 6. Percentage of Active Sensors vs. Total Power (horizontal axis: total power $P_{tot}$, in Watts)



B. Optimal power allocation with both sum and individual 
power constraints 

In the distortion minimization problem with only a sum power constraint, that constraint is imposed to guarantee a fair comparison when we change the number of sensor nodes. In some application scenarios, the sum power constraint also has a physical meaning. For example, assume that there are multiple clusters of sensors, each performing a different observation task. If different clusters share the same frequency band to transmit information, the total power emitted from each cluster must be limited to enable the coexistence of multiple clusters. In addition, a more severe power constraint may be imposed on each individual sub-band used by each sensor for better frequency reuse. This is modeled by individual power constraints for all the sensors. Note that the individual power constraint may also be imposed by the power supply characteristics at each node.

Fig. 7. Outage Probability Comparison vs. Total Power

Given these considerations, we now cast an optimization 
model with both individual and sum power constraints. The 
optimization problem then becomes 

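$$\begin{aligned}
\min_{\{\alpha'_k\}} \quad & \left(\sum_{k=1}^{K}\frac{\alpha'_k s_k}{\gamma_k^{-1}\alpha'_k s_k+1}\right)^{-1}\\
\text{s.t.} \quad & \sum_{k=1}^{K}\alpha'_k(1+\gamma_k^{-1}) \le P_{tot},\\
& \alpha'_k(1+\gamma_k^{-1}) \le p_k^{\max}, \quad k=1,\dots,K,\\
& \alpha'_k \ge 0, \quad k=1,\dots,K,
\end{aligned} \tag{16}$$

where $p_k^{\max}$ is the maximum allowable transmit power for node k. Minimizing the inverse of the sum is equivalent to maximizing the sum. Combining the last two sets of constraints, the problem can therefore be simplified to

$$\begin{aligned}
\min_{\{\alpha'_k\}} \quad & -\sum_{k=1}^{K}\frac{\alpha'_k s_k}{\gamma_k^{-1}\alpha'_k s_k+1}\\
\text{s.t.} \quad & \sum_{k=1}^{K}\alpha'_k(1+\gamma_k^{-1}) \le P_{tot},\\
& 0 \le \alpha'_k \le C_k, \quad k=1,\dots,K,
\end{aligned} \tag{17}$$

where $C_k = p_k^{\max}/(1+\gamma_k^{-1})$.

The optimization problem given in Eq. (17) is still convex, since we have only added extra linear constraints to the problem with only the sum power constraint. However, it is now more difficult to compute an analytical solution. We propose the following algorithm to compute the optimal solution.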

The Algorithm:

1) Solve the problem without individual power constraints (i.e., the problem with only the sum power constraint) to obtain the solution as in Eq. (13). Set the index set $\mathcal{K}_e = \{k \mid \alpha'^{\,\mathrm{opt}}_k > C_k\}$.
2) Set $\alpha'^{\,\mathrm{opt}}_k = C_k$ for $k \in \mathcal{K}_e$; set $P_{tot} \leftarrow P_{tot} - \sum_{k\in\mathcal{K}_e}(1+\gamma_k^{-1})C_k$; remove $\alpha'_k$ for $k \in \mathcal{K}_e$ from the design variable space.
3) Repeat the previous two steps until $\mathcal{K}_e$ is empty in Step 1.

To prove that the proposed algorithm leads to the global optimum, we only need to show that Step 2 does not lose optimality of $\alpha'^{\,\mathrm{opt}}_k$ for $k \in \mathcal{K}_e$ when we set $\alpha'^{\,\mathrm{opt}}_k = C_k$ for $k \in \mathcal{K}_e$. This follows easily from the fact that the objective function is monotonically decreasing with respect to $\alpha'_k$ for all k. Step 2 assigns to $\alpha'^{\,\mathrm{opt}}_k$, $k \in \mathcal{K}_e$, the maximum values allowed within the feasible region. Hence there is no optimality loss, i.e., these variables are assigned the optimal values that minimize the objective function.
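The three steps of the algorithm can be sketched in Python as follows. This is our own illustration, not the authors' code. `waterfill` stands in for the sum-power-only solution of Eq. (13), using a KKT form α'_k = (γ_k/s_k)(√η_k/ζ − 1)^+ that we derived ourselves and assume has the same structure. `capped_allocation` is a hypothetical name for the iterative capping procedure.

```python
import math


def waterfill(s, gamma, p_tot):
    """Sum-power-only optimum (assumed form of Eq. (13)): alpha'_k = (gamma_k/s_k)(sqrt(eta_k)/zeta - 1)^+."""
    eta = [sk / (1.0 + 1.0 / g) for sk, g in zip(s, gamma)]
    order = sorted(range(len(s)), key=lambda k: eta[k], reverse=True)
    num = den = zeta = 0.0
    active = []
    for k in order:
        if eta[k] <= 0.0 or math.sqrt(eta[k]) <= zeta:
            break  # remaining (weaker) sensors are switched off
        active.append(k)
        num += gamma[k] / math.sqrt(eta[k])
        den += gamma[k] / eta[k]
        zeta = num / (p_tot + den)
    alpha = [0.0] * len(s)
    for k in active:
        alpha[k] = max(0.0, gamma[k] / s[k] * (math.sqrt(eta[k]) / zeta - 1.0))
    return alpha


def capped_allocation(s, gamma, p_tot, p_max):
    """Steps 1-3: solve without caps, clamp violators to C_k, shrink the budget, repeat."""
    K = len(s)
    C = [p_max[k] / (1.0 + 1.0 / gamma[k]) for k in range(K)]  # C_k = p_k^max / (1 + 1/gamma_k)
    alpha = [0.0] * K
    free = list(range(K))  # indices still in the design variable space
    budget = p_tot
    while free:
        sub = waterfill([s[k] for k in free], [gamma[k] for k in free], budget)  # Step 1
        exceeded = {k for k, a in zip(free, sub) if a > C[k]}                     # the set K_e
        if not exceeded:  # Step 3: K_e is empty, keep the current solution
            for k, a in zip(free, sub):
                alpha[k] = a
            break
        for k in exceeded:                                                        # Step 2
            alpha[k] = C[k]
            budget -= (1.0 + 1.0 / gamma[k]) * C[k]
        budget = max(budget, 0.0)  # guard against round-off
        free = [k for k in free if k not in exceeded]
    return alpha
```

With the caps p_k^max = 1.5P_tot/K used for Fig. 8, equal power is still feasible. Therefore the capped solution should never be worse than equal-power transmission.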

To illustrate how the individual power constraints affect the outage performance, we consider a six-node example, with the other parameters set as before. Fig. 8 plots the outage performance when each node has an individual power constraint $p_k^{\max} = 1.5P_{tot}/K$ in addition to the sum power constraint. The curves show that the diversity order remains the same when individual power constraints are added. However, the adaptive power gain over the equal-power case is smaller than in the case with only a sum power constraint.

Fig. 8. Effect of Individual Power Constraints (6 nodes)

C. Practical Issues



To obtain the desired transmit power levels for each sensor, we have assumed that the fusion center knows $\{(\gamma_k, s_k): k = 1, 2, \dots, K\}$. This assumption is reasonable when the network condition and the signal being estimated change slowly in a quasi-static manner. We have also assumed that the fusion center executes the optimization and then activates the sensors with their respective power levels. Our approach applies generally to the estimation of a memoryless discrete-time random process $\theta(t)$. Because the source and sensor observations are memoryless in time, we can perform sample-by-sample estimation without significant loss in estimation performance. This also gives important practical benefits such as easy implementation and minimum delay.
V. Minimum-Power Estimation with Zero Outage

In the previous sections, we have shown that with Rayleigh fading, non-zero outage is experienced when there are sum or individual power constraints. However, with K observation sensors, both equal-power and optimal power transmission strategies can achieve order-K estimation diversity, and the latter can further achieve an adaptive power gain. In this section, we discuss a converse problem. Consider a given set of channel gains, which may be one realization of Rayleigh fading or may simply be caused by different transmission distances. We seek the optimal power allocation scheme that minimizes the total power consumption while satisfying a given distortion requirement. If the distortion requirement is satisfied with minimum power consumption for each given channel realization, we call the scheme minimum-power estimation with zero outage.

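Based on the above discussion, the minimum-power estimation problem can be cast as

$$\min \sum_{k=1}^{K} P_k \quad \text{s.t.}\quad \mathrm{Var}[\hat\theta] \le D_0,$$

where $D_0$ is the distortion target. Recall that $\mathrm{Var}[\hat\theta] = \sigma_\theta^2\big(\sum_{k=1}^{K}\alpha'_k s_k/(\gamma_k^{-1}\alpha'_k s_k+1)\big)^{-1}$ and $P_k = \alpha'_k(1+\gamma_k^{-1})$. Given a set of channel SNRs $[s_1, s_2, \dots, s_K]$ and a set of local observation SNRs $[\gamma_1, \gamma_2, \dots, \gamma_K]$, the above problem is therefore equivalent to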

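$$\begin{aligned}
\min_{\{\alpha'_k\}} \quad & \sum_{k=1}^{K}\alpha'_k(1+\gamma_k^{-1})\\
\text{s.t.} \quad & \sigma_\theta^2\left(\sum_{k=1}^{K}\frac{\alpha'_k s_k}{\gamma_k^{-1}\alpha'_k s_k+1}\right)^{-1} \le D_0,\\
& \alpha'_k \ge 0, \quad k=1,\dots,K.
\end{aligned}$$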



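Unfortunately, this problem is not convex over the $\alpha'_k$'s. Let us define

$$r_k = \frac{\alpha'_k s_k}{\gamma_k^{-1}\alpha'_k s_k+1} = \frac{1}{\gamma_k^{-1}+(\alpha'_k s_k)^{-1}}, \quad \forall k. \tag{18}$$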



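Then the above optimization problem is equivalent to

$$\begin{aligned}
\min \quad & \sum_{k=1}^{K}\alpha'_k(1+\gamma_k^{-1})\\
\text{s.t.} \quad & \sum_{k=1}^{K} r_k \ge \frac{\sigma_\theta^2}{D_0},\\
& r_k = \frac{1}{\gamma_k^{-1}+(\alpha'_k s_k)^{-1}}, \quad \alpha'_k \ge 0, \quad \forall k,
\end{aligned}$$

where we see that the variable $\alpha'_k$ can be completely replaced by a function of $r_k$. From Eq. (18), we obtain

$$\alpha'_k = \frac{\gamma_k r_k}{s_k(\gamma_k - r_k)}, \quad \forall k.$$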



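Therefore, noting that $\eta_k = s_k/(1+\gamma_k^{-1})$, the problem can be transformed into the following problem with variables $\{r_1, r_2, \dots, r_K\}$:

$$\begin{aligned}
\min \quad & \sum_{k=1}^{K}\frac{1}{\eta_k\left(r_k^{-1}-\gamma_k^{-1}\right)}\\
\text{s.t.} \quad & \sum_{k=1}^{K} r_k \ge \frac{\sigma_\theta^2}{D_0},\\
& 0 \le r_k < \gamma_k, \quad \forall k,
\end{aligned} \tag{19}$$

which is convex over the $r_k$'s. The upper limit on $r_k$ in the second constraint is due to the fact that $r_k = 1/\big(\gamma_k^{-1} + (\alpha'_k s_k)^{-1}\big)$ and $(\alpha'_k s_k)^{-1} > 0$.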
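The short derivation below checks the step from Eq. (18) to Eq. (19) and the threshold form of the solution in Eq. (21). It is our own worked check, not part of the original text. The multiplier μ is introduced only here, and the identification ρ_0 = μ^{-1/2} is our reading, consistent with Eq. (21).

```latex
% (i) Sensor power in terms of r_k, using Eq. (18) and \eta_k = s_k/(1+\gamma_k^{-1})
P_k = \alpha'_k\,(1+\gamma_k^{-1})
    = \frac{1+\gamma_k^{-1}}{s_k}\cdot\frac{\gamma_k r_k}{\gamma_k - r_k}
    = \frac{1}{\eta_k}\cdot\frac{\gamma_k r_k}{\gamma_k - r_k}
    = \frac{1}{\eta_k\,(r_k^{-1}-\gamma_k^{-1})}

% (ii) Convexity on 0 \le r_k < \gamma_k
\frac{\partial P_k}{\partial r_k} = \frac{\gamma_k^2}{\eta_k(\gamma_k - r_k)^2} > 0,
\qquad
\frac{\partial^2 P_k}{\partial r_k^2} = \frac{2\gamma_k^2}{\eta_k(\gamma_k - r_k)^3} > 0

% (iii) Stationarity with a multiplier \mu \ge 0 for \sum_k r_k \ge \sigma_\theta^2/D_0;
%       the marginal cost at r_k = 0 is 1/\eta_k, so sensor k stays off when 1/\eta_k \ge \mu
\frac{\gamma_k^2}{\eta_k(\gamma_k - r_k)^2} = \mu
\;\Longrightarrow\;
r_k = \gamma_k\Big(1 - \frac{1}{\sqrt{\mu\,\eta_k}}\Big)^{+}
    = \gamma_k\Big(1 - \frac{\rho_0}{\sqrt{\eta_k}}\Big)^{+},
\qquad \rho_0 \equiv \mu^{-1/2}
```

Step (iii) shows why only sensors with √η_k > ρ_0 receive power.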



The solution of Eq. (19) can be stated in the same way as for the sum-power-constrained problem in Section IV. As before, we rank the sensors according to $\eta_1 \ge \eta_2 \ge \dots \ge \eta_K$, and define

$$g(k) = \sqrt{\eta_k} - \frac{C(k)}{D(k)}, \quad 1 \le k \le K, \tag{20}$$

where $C(k) = \sum_{m=1}^{k}\gamma_m - \sigma_\theta^2/D_0$ and $D(k) = \sum_{m=1}^{k}\gamma_m/\sqrt{\eta_m}$. Find $K_1$ such that $g(K_1) > 0$ and $g(K_1+1) \le 0$. If $g(k) > 0$ for all $1 \le k \le K$, we take $K_1 = K$. Also define $\rho_0 = C(K_1)/D(K_1)$. Then the optimal solution is given as



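$$r_k^{\mathrm{opt}} = \gamma_k\left(1 - \frac{\rho_0}{\sqrt{\eta_k}}\right)^{+}, \quad \forall k, \tag{21}$$

where $(x)^+$ equals 0 when $x < 0$, and otherwise equals $x$. Hence, by definition, we have

$$\alpha'^{\,\mathrm{opt}}_k = \begin{cases} \dfrac{1}{s_k\left((r_k^{\mathrm{opt}})^{-1}-\gamma_k^{-1}\right)} = \dfrac{\gamma_k}{s_k}\left(\dfrac{\sqrt{\eta_k}}{\rho_0}-1\right), & k \le K_1,\\[2mm] 0, & k > K_1. \end{cases} \tag{22}$$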



As in Section IV, the optimal strategy for minimum-power transmission with zero outage allocates transmit power only to sensors with better channel SNR and observation quality. Again, the figure of merit is $\eta_k = s_k/(1+\gamma_k^{-1})$. If a sensor's $\eta_k$ falls below a certain threshold (here, $\sqrt{\eta_k} \le \rho_0$), it should be turned off to save power. Not surprisingly, the solution in Eq. (22) is very similar to the one in Eq. (13); only the universal constants $\zeta_0$ and $\rho_0$ differ. In Eq. (13), $\zeta_0$ is determined by the power constraint, while in Eq. (22), $\rho_0$ is determined by the distortion requirement.

We now solve the optimization problem to show how much power can be saved compared with an equal-power transmission strategy that satisfies the zero-outage distortion requirement with minimum sum power. We use the same setup as in Section III, except that we now have 100 sensors and plot the average sum power consumption for different distortion targets. At each distortion target $D_0$, the required sum power is averaged over 10,000 independent channel realizations. The result is shown in Fig. 9. The stricter the distortion requirement (smaller $D_0$), the more power the optimal power allocation strategy saves, which is very important in energy-constrained sensor networks.
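Below is a minimal Python sketch of the threshold solution in Eqs. (20)-(22), together with an equal-power baseline of the kind compared in Fig. 9. The assumptions are ours:

- all s_k > 0;
- the caller supplies σ_θ²;
- the baseline uses α'_k = P/(K(1+γ_k^{-1})), with the smallest P meeting Var[θ̂] ≤ D_0 found by bisection.

The function names are hypothetical.

```python
import math


def min_power_allocation(s, gamma, sigma2, d0):
    """Threshold solution of Eqs. (20)-(22); assumes every s_k > 0."""
    K = len(s)
    target = sigma2 / d0
    if sum(gamma) <= target:
        raise ValueError("infeasible: need sum_k gamma_k > sigma_theta^2 / D0")
    eta = [sk / (1.0 + 1.0 / g) for sk, g in zip(s, gamma)]  # figure of merit eta_k
    order = sorted(range(K), key=lambda k: eta[k], reverse=True)  # eta_1 >= eta_2 >= ...
    C, D = -target, 0.0  # running C(k) and D(k) from Eq. (20)
    for k in order:
        C_next = C + gamma[k]
        D_next = D + gamma[k] / math.sqrt(eta[k])
        if math.sqrt(eta[k]) - C_next / D_next <= 0.0:  # g(k) <= 0, so we are past K1
            break
        C, D = C_next, D_next
    rho0 = C / D  # rho_0 = C(K1) / D(K1)
    alpha = [0.0] * K
    for k in range(K):
        if math.sqrt(eta[k]) > rho0:  # active sensor: r_k^opt > 0 in Eq. (21)
            alpha[k] = gamma[k] / s[k] * (math.sqrt(eta[k]) / rho0 - 1.0)  # Eq. (22)
    return alpha


def distortion(alpha, s, gamma, sigma2):
    """Var[theta_hat] = sigma_theta^2 (sum_k alpha'_k s_k / (alpha'_k s_k / gamma_k + 1))^{-1}."""
    total = sum(a * sk / (a * sk / g + 1.0) for a, sk, g in zip(alpha, s, gamma))
    return sigma2 / total if total > 0 else math.inf


def total_power(alpha, gamma):
    """Sum power: sum_k alpha'_k (1 + 1/gamma_k)."""
    return sum(a * (1.0 + 1.0 / g) for a, g in zip(alpha, gamma))


def equal_power_min_total(s, gamma, sigma2, d0, rel_tol=1e-12):
    """Smallest P such that alpha'_k = P / (K (1 + 1/gamma_k)) meets Var[theta_hat] <= D0."""
    K = len(s)

    def alloc(p):
        return [p / (K * (1.0 + 1.0 / g)) for g in gamma]

    hi = 1.0
    while distortion(alloc(hi), s, gamma, sigma2) > d0:  # grow until feasible
        hi *= 2.0
    lo = 0.0
    while hi - lo > rel_tol * hi:  # bisection on the total power
        mid = 0.5 * (lo + hi)
        if distortion(alloc(mid), s, gamma, sigma2) > d0:
            lo = mid
        else:
            hi = mid
    return hi
```

Averaging `total_power(min_power_allocation(...), gamma)` and `equal_power_min_total(...)` over random channel draws gives a comparison of the type shown in Fig. 9. The optimal allocation meets the distortion target with equality.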

Fig. 9. Average sum power vs. distortion requirements

Discussion of Maximizing Sensor Network Lifetime



In our model, we minimize the power sum $\sum_k P_k$, i.e., the $L^1$-norm of the transmission power vector $\mathbf{P} = [P_1, P_2, \dots, P_K]$. Suppose the channel gain and the variance of the observation noise for each sensor are ergodically time-varying on a block-by-block basis. Then minimizing the $L^1$-norm of $\mathbf{P}$ in each time block minimizes $E\{\sum_k P_k\}$. In other words, it maximizes the lower bound of the average node lifetime, where the lifetime of node k is defined as $E_0/E\{P_k\}$ and $E_0$ is the battery energy available to each sensor (we assume that $E_0$ is the same for all the sensors). This follows from

$$\frac{1}{K}\sum_{k=1}^{K}\frac{E_0}{E\{P_k\}} \ge \frac{E_0}{\frac{1}{K}\sum_{k=1}^{K}E\{P_k\}} = \frac{K E_0}{E\{\sum_{k=1}^{K}P_k\}},$$

which is Jensen's inequality [29] applied to the convex function $1/x$. However, when the channel is static and the variance of the observation noise is time-invariant, minimizing the $L^1$-norm may cause some individual sensors to consume too much power and die out quickly. In this case, minimizing the $L^\infty$-norm, i.e., minimizing the maximum of the individual power values, is the fairest choice for all sensors, but the total power consumption can be high. A good compromise is to minimize the $L^2$-norm of $\mathbf{P}$ [30]. This penalizes the large terms in the power vector while still keeping the total power consumption reasonably low. Specifically, for $L^2$-norm minimization, the problem formulation becomes



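$$\begin{aligned}
\min \quad & \sum_{k=1}^{K}\left(\frac{1}{\eta_k\left(r_k^{-1}-\gamma_k^{-1}\right)}\right)^{2}\\
\text{s.t.} \quad & \sum_{k=1}^{K} r_k \ge \frac{\sigma_\theta^2}{D_0}; \quad 0 \le r_k < \gamma_k, \quad \forall k,
\end{aligned} \tag{23}$$

which is still a convex problem. Note that minimizing the various norms of $\mathbf{P}$ may not be optimal, given that we still lack a unified definition of sensor network lifetime. A complete treatment of this problem is beyond the scope of this paper.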
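The L2 variant in Eq. (23) loses the threshold structure. The marginal cost 2P_k ∂P_k/∂r_k is zero at r_k = 0, so every sensor receives some power. The sketch below is our own and assumes all s_k > 0 and a feasible target. It solves Eq. (23) by nested bisection on a Lagrange multiplier μ (our notation) and compares the result with the L1 threshold solution r_k = γ_k(1 − ρ_0/√η_k)^+ of Eqs. (20)-(21). The function names are hypothetical.

```python
import math


def power_of_r(r, gamma, eta):
    """P_k = 1 / (eta_k (1/r_k - 1/gamma_k)) = gamma_k r_k / (eta_k (gamma_k - r_k)), for 0 <= r_k < gamma_k."""
    return gamma * r / (eta * (gamma - r))


def l1_solution(gamma, eta, target):
    """Minimum-sum-power (L1) solution r_k = gamma_k (1 - rho0 / sqrt(eta_k))^+ from Eqs. (20)-(21)."""
    order = sorted(range(len(gamma)), key=lambda k: eta[k], reverse=True)
    C, D = -target, 0.0
    for k in order:
        C_next, D_next = C + gamma[k], D + gamma[k] / math.sqrt(eta[k])
        if math.sqrt(eta[k]) - C_next / D_next <= 0.0:
            break
        C, D = C_next, D_next
    rho0 = C / D
    return [g * max(0.0, 1.0 - rho0 / math.sqrt(e)) for g, e in zip(gamma, eta)]


def l2_solution(gamma, eta, target, iters=100):
    """Eq. (23): minimize sum_k P_k^2 s.t. sum_k r_k >= target and 0 <= r_k < gamma_k."""

    def r_of_mu(mu):
        # Stationarity: phi_k(r) = 2 P_k dP_k/dr = 2 gamma^3 r / (eta^2 (gamma - r)^3) = mu.
        # phi_k increases from 0 to infinity on [0, gamma), so bisection finds the root.
        rs = []
        for g, e in zip(gamma, eta):
            lo, hi = 0.0, g
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                if 2.0 * g**3 * mid < mu * e**2 * (g - mid) ** 3:  # phi_k(mid) < mu, division-free
                    lo = mid
                else:
                    hi = mid
            rs.append(lo)
        return rs

    lo = hi = 1.0  # bracket mu; sum_k r_k(mu) increases with mu
    while sum(r_of_mu(lo)) > target:
        lo /= 2.0
    while sum(r_of_mu(hi)) < target:
        hi *= 2.0
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisection on log(mu)
        if sum(r_of_mu(mid)) < target:
            lo = mid
        else:
            hi = mid
    return r_of_mu(hi)  # the hi side satisfies sum_k r_k >= target
```

Evaluating `power_of_r` on both solutions shows the trade-off. The L1 solution has the smaller sum power. The L2 solution has the smaller sum of squared powers and keeps every sensor active.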



VI. Conclusions 

For the distributed estimation of an unknown source, we have introduced a new concept of estimation outage and defined the corresponding estimation diversity. We considered i.i.d. observation noise variances at different sensors and i.i.d. fading channels between the sensors and the fusion center. We have shown that full estimation diversity (on the order of the number of sensor nodes K) can be achieved even with simple equal-power transmission strategies. We have further shown that the end-to-end distortion can be minimized under sum power constraints. In that case, turning off sensors with bad channels and bad observation quality yields an adaptive power gain on top of the full diversity gain. Moreover, we have demonstrated that imposing an extra individual power constraint at each sensor causes some performance loss, namely a reduced adaptive power gain, while the diversity order is preserved. We have also investigated minimum-power transmission with zero estimation outage, which achieves significant power savings over equal-power transmission schemes.

APPENDIX 
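A. Derivation of $\lim_{K\to\infty}\mathrm{Var}[\hat\theta]$ in Eq. (6)

We start with

$$\frac{P_{tot}s_k}{K(1+\gamma_k^{-1})} - \frac{P_{tot}s_k}{\gamma_k^{-1}P_{tot}s_k + K(1+\gamma_k^{-1})} = \frac{\gamma_k^{-1}P_{tot}^2 s_k^2}{K(1+\gamma_k^{-1})\left(\gamma_k^{-1}P_{tot}s_k + K(1+\gamma_k^{-1})\right)} \le \frac{\gamma_k^{-1}P_{tot}^2 s_k^2}{K^2},$$

where the last step uses $\gamma_k^{-1}P_{tot}s_k + K(1+\gamma_k^{-1}) \ge K$ and $1+\gamma_k^{-1} \ge 1$. Thus we have the following inequalities:

$$\frac{P_{tot}s_k}{K(1+\gamma_k^{-1})} - \frac{\gamma_k^{-1}P_{tot}^2 s_k^2}{K^2} \le \frac{P_{tot}s_k}{\gamma_k^{-1}P_{tot}s_k + K(1+\gamma_k^{-1})} \le \frac{P_{tot}s_k}{K(1+\gamma_k^{-1})}.$$

With equal-power transmission, $\sigma_\theta^2(\mathrm{Var}[\hat\theta])^{-1} = \sum_{k=1}^{K} P_{tot}s_k/\big(\gamma_k^{-1}P_{tot}s_k + K(1+\gamma_k^{-1})\big)$. Summing over k, we therefore have

$$\sum_{k=1}^{K}\frac{P_{tot}s_k}{K(1+\gamma_k^{-1})} - \sum_{k=1}^{K}\frac{\gamma_k^{-1}P_{tot}^2 s_k^2}{K^2} \le \sigma_\theta^2\left(\mathrm{Var}[\hat\theta]\right)^{-1} \le \sum_{k=1}^{K}\frac{P_{tot}s_k}{K(1+\gamma_k^{-1})}. \tag{24}$$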
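The sandwich in Eq. (24) can be checked numerically. The Python sketch below does so and also shows Var[θ̂] settling toward the finite limit of Eq. (26) as K grows. It assumes equal-power transmission, unit-mean exponential s_k (Rayleigh fading) and, for the limit check only, a common hypothetical γ_k = γ_0. Under these assumptions the limit reduces to σ_θ²(1+γ_0^{-1})/P_tot. The function names are our own.

```python
import math
import random


def sandwich(s, gamma, p_tot):
    """Return (lower, middle, upper) of Eq. (24) for equal-power transmission.

    middle = sigma_theta^2 / Var[theta_hat] = sum_k P s_k / (P s_k / gamma_k + K (1 + 1/gamma_k)).
    """
    K = len(s)
    upper = sum(p_tot * sk / (K * (1.0 + 1.0 / g)) for sk, g in zip(s, gamma))
    middle = sum(p_tot * sk / (p_tot * sk / g + K * (1.0 + 1.0 / g)) for sk, g in zip(s, gamma))
    lower = upper - sum(p_tot**2 * sk**2 / (g * K**2) for sk, g in zip(s, gamma))
    return lower, middle, upper


def var_equal_power(K, p_tot, gamma0, sigma2=1.0, seed=0):
    """Var[theta_hat] for one draw of K unit-mean exponential channel SNRs and gamma_k = gamma0."""
    rng = random.Random(seed)
    s = [rng.expovariate(1.0) for _ in range(K)]
    _, middle, _ = sandwich(s, [gamma0] * K, p_tot)
    return sigma2 / middle
```

For instance, with P_tot = 10 and γ_0 = 5 the limit is (1 + 1/5)/10 = 0.12. Calling `var_equal_power(K, 10.0, 5.0)` for K = 10, 100, 1000, ... approaches that value.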



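It follows from the strong Law of Large Numbers (LLN) [27] that, when $K \to \infty$,

$$\sum_{k=1}^{K}\frac{P_{tot}s_k}{K(1+\gamma_k^{-1})} \to P_{tot}E\left[s_1/(1+\gamma_1^{-1})\right], \qquad \sum_{k=1}^{K}\frac{\gamma_k^{-1}P_{tot}^2 s_k^2}{K} \to P_{tot}^2 E\left(\gamma_1^{-1}s_1^2\right),$$

and hence $\sum_{k=1}^{K}\gamma_k^{-1}P_{tot}^2 s_k^2/K^2 \to 0$, provided that $E[s_k/(1+\gamma_k^{-1})]$ and $E[\gamma_k^{-1}s_k^2]$ are finite. Therefore,

$$\lim_{K\to\infty}\sigma_\theta^2\left(\mathrm{Var}[\hat\theta]\right)^{-1} = P_{tot}E\left[s_1/(1+\gamma_1^{-1})\right], \tag{25}$$

which implies that

$$\lim_{K\to\infty}\mathrm{Var}[\hat\theta] = \frac{\sigma_\theta^2}{P_{tot}E\left[s_1/(1+\gamma_1^{-1})\right]}. \tag{26}$$

B. Proof of Theorem 3.1

Based on results from large deviation theory [27], we first establish the following lemma.

Lemma 6.1: Suppose $\beta_k$, $k = 1, \dots, K$, are i.i.d. random variables, and $a$ is a constant satisfying $a < E(\beta_k)$. Then for any $K \ge 1$,

$$\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}\beta_k \le a\right\} \le \exp\left(-K I_\beta(a)\right), \tag{27}$$

where $I_\beta(a)$ is the rate function of $\beta_k$, which is defined to be

$$I_\beta(a) = \sup_{\theta}\left(\theta a - \log M_\beta(\theta)\right), \tag{28}$$

and $M_\beta(\theta)$ is the moment generating function of $\beta_k$. Similarly, if $a > E(\beta_k)$, then for any $K \ge 1$,

$$\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}\beta_k \ge a\right\} \le \exp\left(-K I_\beta(a)\right).$$

Remark 6.2: The exponent $I_\beta(a)$ in Eq. (28) is nonnegative, since $I_\beta(a) \ge (\theta a - \log M_\beta(\theta))|_{\theta=0} = 0$. It is convex in $a$, since it is the supremum of linear functions of $a$. Also, $I_\beta(a) = 0$ if $a = E(\beta_k)$; $I_\beta(a)$ is an increasing function of $a$ for $a > E(\beta_k)$, and a decreasing function of $a$ for $a < E(\beta_k)$. In addition, $I_\beta(a)$ leads to a tight bound in Eq. (27) in the sense that

$$\lim_{K\to\infty} -\frac{1}{K}\log\left(\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}\beta_k \le a\right\}\right) = I_\beta(a) \tag{29}$$

if i) $M_\beta(\theta)$ is finite in some neighborhood of $\theta = 0$, and ii) $M_\beta(\theta)$ is differentiable in a neighborhood of $\theta^*$, where $\theta^*$ attains the supremum in Eq. (28). More details can be found in [27] or Section III of [28].

We now continue with the proof of Theorem 3.1. From the second inequality in Eq. (24), we get

$$P_{D_0} = \mathrm{Prob}\{\mathrm{Var}[\hat\theta] > D_0\} \ge \mathrm{Prob}\left\{\sum_{k=1}^{K}\frac{P_{tot}s_k}{K(1+\gamma_k^{-1})} < \frac{\sigma_\theta^2}{D_0}\right\} = \mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}\frac{s_k}{1+\gamma_k^{-1}} < a\right\},$$

where we introduce the constant $a = \sigma_\theta^2/(D_0 P_{tot})$. This implies that

$$\lim_{K\to\infty} -\frac{1}{K}\log P_{D_0} \le \lim_{K\to\infty} -\frac{1}{K}\log\left(\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}\frac{s_k}{1+\gamma_k^{-1}} < a\right\}\right).$$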
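The Python sketch below makes Lemma 6.1 and the rate function in Eq. (28) concrete. Purely for illustration, it assumes a common γ_k = γ_0 and unit-mean exponential s_k. Each s_k/(1+γ_k^{-1}) is then exponential with mean m = 1/(1+γ_0^{-1}), and its rate function has the closed form I(a) = a/m − 1 − ln(a/m). The sum of K such variables is Gamma distributed, so Prob{(1/K)Σ_k s_k/(1+γ_k^{-1}) < a} equals the probability that a Poisson(Ka/m) variable is at least K. The code evaluates this probability exactly in log-space. The function names are hypothetical.

```python
import math


def rate_function_numeric(a, m, iters=300):
    """I(a) = sup_theta [theta a - log M(theta)] for an exponential r.v. with mean m.

    M(theta) = 1 / (1 - m theta) for theta < 1/m; the objective is concave, so ternary search works.
    """
    def objective(theta):
        return theta * a + math.log(1.0 - m * theta)

    lo, hi = -1e4, (1.0 - 1e-12) / m
    for _ in range(iters):
        t1 = lo + (hi - lo) / 3.0
        t2 = hi - (hi - lo) / 3.0
        if objective(t1) < objective(t2):
            lo = t1
        else:
            hi = t2
    return objective(0.5 * (lo + hi))


def rate_function_exp(a, m):
    """Closed form of Eq. (28) for an exponential r.v. with mean m."""
    x = a / m
    return x - 1.0 - math.log(x)


def log_prob_mean_below(K, a, m, extra_terms=2000):
    """log Prob{(1/K) sum_k x_k < a} for i.i.d. exponential x_k with mean m, via the Poisson tail."""
    z = K * a / m
    logs = [-z + j * math.log(z) - math.lgamma(j + 1) for j in range(K, K + extra_terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))  # log-sum-exp, no underflow
```

Plotting −(1/K)·`log_prob_mean_below(K, a, m)` against K shows it approaching I(a) from above, as predicted by Eq. (29). The Chernoff bound of Lemma 6.1 holds for every K.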



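On the other hand, from the first inequality in Eq. (24), we obtain

$$\begin{aligned}
\lim_{K\to\infty} -\frac{1}{K}\log\left(\mathrm{Prob}\{\mathrm{Var}[\hat\theta] > D_0\}\right)
&\ge \lim_{K\to\infty} -\frac{1}{K}\log\left(\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}\frac{P_{tot}s_k}{1+\gamma_k^{-1}} - \frac{1}{K^2}\sum_{k=1}^{K}\gamma_k^{-1}P_{tot}^2 s_k^2 < \frac{\sigma_\theta^2}{D_0}\right\}\right)\\
&\overset{(a)}{=} \lim_{K\to\infty} -\frac{1}{K}\log\left(\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}\frac{P_{tot}s_k}{1+\gamma_k^{-1}} < \frac{\sigma_\theta^2}{D_0}\right\}\right),
\end{aligned}$$

where (a) is due to the following lemma.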

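Lemma 6.3: Suppose $\{v_k : 1 \le k \le K\}$ and $\{\beta_k : 1 \le k \le K\}$ are two sets of i.i.d. random variables with bounded first moments, $c_1$ is a constant satisfying $c_1 < E(v_k)$, and $E(\beta_k) = b$. Further assume that $v_k$ and $\beta_k$ both satisfy the regularity requirements described in Remark 6.2. Then

$$\underbrace{\lim_{K\to\infty} -\frac{1}{K}\log\left(\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}v_k - \frac{1}{K^2}\sum_{k=1}^{K}\beta_k < c_1\right\}\right)}_{A} = \underbrace{\lim_{K\to\infty} -\frac{1}{K}\log\left(\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}v_k < c_1\right\}\right)}_{B}. \tag{30}$$

Proof: We prove $A = B$ by showing that $A \ge B$ and $A \le B$ both hold. First, "$\le$" clearly holds in Eq. (30). Since $\beta_k \ge 0$ (as is the case for $\beta_k = \gamma_k^{-1}P_{tot}^2 s_k^2$), we have

$$\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}v_k - \frac{1}{K^2}\sum_{k=1}^{K}\beta_k < c_1\right\} \ge \mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}v_k < c_1\right\}.$$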
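We next show the inequality in the other direction. For any $K \in \mathbb{Z}^+$ and $\epsilon > 0$, it holds that

$$\begin{aligned}
&\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}v_k - \frac{1}{K^2}\sum_{k=1}^{K}\beta_k < c_1\right\}\\
&= \mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}v_k - \frac{1}{K^2}\sum_{k=1}^{K}\beta_k < c_1,\ \frac{1}{K^2}\sum_{k=1}^{K}\beta_k \le \epsilon\right\} + \mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}v_k - \frac{1}{K^2}\sum_{k=1}^{K}\beta_k < c_1,\ \frac{1}{K^2}\sum_{k=1}^{K}\beta_k > \epsilon\right\}\\
&\le \mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}v_k < c_1 + \epsilon\right\} + \mathrm{Prob}\left\{\frac{1}{K^2}\sum_{k=1}^{K}\beta_k > \epsilon\right\}.
\end{aligned} \tag{31}$$

Lemma 6.1, together with the tightness result in Eq. (29), implies that

$$\lim_{\epsilon\to 0}\lim_{K\to\infty} -\frac{1}{K}\log\left(\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}v_k < c_1 + \epsilon\right\}\right) = \lim_{\epsilon\to 0} I_a(c_1+\epsilon) = I_a(c_1), \tag{32}$$

where $I_a$ is the rate function of $v_k$. Also,

$$\lim_{K\to\infty} -\frac{1}{K}\log\left(\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}\beta_k > K\epsilon\right\}\right) \ge \lim_{c_2\to\infty} I_b(c_2) = \infty, \tag{33}$$

where $I_b$ is the rate function of $\beta_k$. Here we use the fact that, for any fixed $c_2 > b$, $K\epsilon > c_2$ for all sufficiently large K.

Therefore, it follows from Eqs. (31)-(33) that

$$\lim_{K\to\infty} -\frac{1}{K}\log\left(\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}v_k - \frac{1}{K^2}\sum_{k=1}^{K}\beta_k < c_1\right\}\right) \ge I_a(c_1) = \lim_{K\to\infty} -\frac{1}{K}\log\left(\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}v_k < c_1\right\}\right).$$

This completes the proof of the lemma. ∎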
In summary, we have

$$\lim_{K\to\infty} -\frac{1}{K}\log P_{D_0} = \lim_{K\to\infty} -\frac{1}{K}\log\left(\mathrm{Prob}\left\{\frac{1}{K}\sum_{k=1}^{K}\frac{s_k}{1+\gamma_k^{-1}} < a\right\}\right).$$

The random variables $s_k/(1+\gamma_k^{-1})$ are i.i.d., so Lemma 6.1 can be applied to calculate the rate function. Assuming that the mild regularity conditions for $M(\theta)$ described in Remark 6.2 hold, we obtain Theorem 3.1.